Abstract

Homologous recombination is an important operator in the evolution of
biological organisms. However, there is still no clear, generally accepted
understanding of why it exists and under what circumstances it is useful. In
this paper we consider its utility in the context of an infinite-population
haploid model with selection and homologous recombination. We define utility
in terms of two metrics: the increase in frequency of fit genotypes, and
the increase in average population fitness, relative to those associated with
selection only. Explicitly, we exhaustively explore the eight-dimensional
parameter space of a two-locus two-allele system, showing, as a function of
the landscape and the initial population, that recombination is beneficial in
terms of our metrics in two distinct regimes: a landscape-independent regime,
the search regime, where recombination aids in the search for a fit genotype
that is absent or at low frequency in the population; and the modular
regime, associated with quasi-additive fitness landscapes with low epistasis,
where recombination allows for the juxtaposition of fit "modules" or Building
Blocks. Thus, we conclude that the ubiquity and utility of recombination are
intimately associated with the existence of modularity in biological fitness
landscapes.
1. Introduction



The existence, prevalence and utility of genetic recombination is an old
and enduring puzzle of biology. Seminal works, such as [?], [?], [?], [?] among
others, have provided theoretical justifications that add to a long list of
putative mechanisms that may account for recombination's enduring role in
most higher species. Classic [?] and more contemporary reviews [?, ?] on the
subject summarize many of these candidates. Even though the number of
potential explanations is large, none of them has been found compelling enough
to have settled the debate. Additionally, some older propositions have come
under more scrutiny thanks to improved experimental data [?], [10], and it has
even been suggested that the hidden value of sexual recombination might not
lie mainly in the improvement of genetic variability or fitness, or in its
defining properties. As stated in [11]: "...it is generally accepted that the
long-term maintenance and ubiquity of Eukaryotic sex cannot be explained as
an approximate consequence of the inherent properties of sex itself.", a
position exemplified in [12], where it is suggested that recombination might
serve mainly as a stabilizer of mitosis, and that any drawn benefit regarding
genetic inheritance is circumstantial. The plethora of proposed models ranges
from simple ones that are case based [?], [13], [14], to sophisticated
simulations that incorporate many-locus, multiple-allele genotypes, dynamic
recombination rates and sites [15], [16], [17], different levels and types of
epistasis, mutation, complex and variable fitness landscapes, etc. [18], [10].
Studies typically focus on measuring the effects of recombination on average
fitness, but others concentrate on other quantifiable benefits: [?], for
example, reports the virtues of recombination regarding the exploration of
the fitness landscape; in [19] the change over generations of the genetic
linkage distance between epistatic units is discussed; and [20] focuses on the
mean time for a beneficial epistatic group of two alleles to appear on the
same gamete with and without recombination. For a review of the experimental
backing or counterevidence to theoretical explanations for the prevalence of
recombination see [?].

Of course, if we are to understand the benefits of recombination in the
context of a mathematical model, a requirement is that the model itself
captures the very mechanisms by which it is useful in the first place. This then
leads us to ask if the apparent inability to find an agreed universal advantage
for recombination is due to the fact that the considered models are incapable
of modeling the benefits (a defect of the model) or, rather, that the benefits
are not transparent in the analyses of the models that have been studied.
If the models themselves are inadequate then new models with new features
must be developed. If, on the other hand, the analyses themselves are at fault,
one must understand why. In this paper we will start with the hypothesis
that standard population genetics models are capable of showing universal
mechanisms by which recombination is useful. By restricting attention
to a simple two-locus two-allele model we will be able to exhaustively study
the full eight-dimensional parameter space (four landscape parameters, three
population parameters and the recombination probability) of the model. We
will show that the reason why universal mechanisms have been difficult to
identify is twofold: that the benefits are more visible in terms of Building
Blocks (subsets of loci defined by the recombination distribution), not
genotypes, as in standard analyses, and that the benefits of recombination are
particularly associated with quasi-additive fitness landscapes with low
additive epistasis. Such landscapes we term "modular" and thus, we believe, the
results of this paper link two fundamental concepts in biology: the utility
and ubiquity of recombination, and the existence of modularity.

2. Recombination - a Building Block Perspective 

In this section we introduce the theoretical framework and the chief
diagnostics we will use to examine the utility of recombination. As we are
interested here in the interaction of selection and homologous recombination
we will omit mutation. We will consider the evolution$^1$ of a population of
length-$\ell$ haploid sequences governed by the equation [22]

$$\langle P_I(t+1) \rangle = P'_I(t) - p_c \sum_m p_c(m)\, \Delta_I(m,t) \qquad (1)$$

where $\langle P_I(t+1) \rangle$ is the expected frequency of genotype $I$ at
generation $t+1$. In the first term on the right-hand side $P'_I(t)$ is the
selection probability for the genotype $I$. For proportional selection, which
is the selection mechanism we will consider here,
$P'_I(t) = (f_I/\bar f(t)) P_I(t)$, where $f_I$ is the "survival" fitness$^2$
of genotype $I$, $\bar f(t)$ is the average population fitness in the $t$th
generation and $P_I(t)$ is the proportion of genotype $I$ in the population.
In the second term, the recombination distribution, $p_c(m)$, is modeled using
the concept of a recombination mask $m = m_1 m_2 \ldots m_\ell$, which is such
that, if $m_i = 0$, the $i$th locus of the offspring is taken from the $i$th
locus of the first parental sequence, while, if $m_i = 1$, it is taken from the
$i$th locus of the second parental sequence. Finally, $\Delta_I(m,t)$ is the
Selection-weighted linkage disequilibrium (SWLD) coefficient [23] for the
genotype $I$. Explicitly,

$$\Delta_I(m,t) = P'_I(t) - \sum_{J,K} \lambda_I^{JK}(m)\, P'_J(t) P'_K(t) \qquad (2)$$

where $\lambda_I^{JK}(m) = 0, 1$ is an indicator function that represents the
conditional probability that the offspring genotype $I$ is formed given the
parental genotypes $J$ and $K$ and the mask $m$. For example, for two loci,
$\ell = 2$, with binary alleles, $a$ and $b$, $\lambda_{aa}^{aa,bb}(01) = 0$,
while $\lambda_{aa}^{ab,ba}(01) = 1$. The contribution of a particular mask
depends, as we can see, on all possible parental combinations. In this sense,
$\Delta_I(m,t)$, in the space of genotypes, is an exceedingly complicated
function. In the case of diploids, the SWLD coefficient is equivalent to the
functions $D_I$ of Nagylaki [24] and $\theta_I$ described in [25]. For a given
target genotype and mask, $\lambda_I^{JK}(m)$ is a matrix on the indices $J$
and $K$ associated with the parents. For binary alleles, for every mask there
are $2^\ell \times 2^\ell$ possible combinations of parents that need to be
checked to see if they give rise to the offspring $I$. Nevertheless, only
$2^\ell$ elements of the matrix are non-zero. The question is: which ones?
Although $\Delta_I(m,t)$, or equivalently $D_I$ or $\theta_I$, gives a complete
summary of the effect of recombination in a given generation, it is difficult
to analyze directly. However, the complication of $\lambda_I^{JK}(m)$ in terms
of genotypes is just an indication of the fact that the latter are not a
natural basis for describing the action of recombination.

1 We will restrict attention here to a generational model with no overlap.
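The mask mechanism and the indicator function described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`offspring`, `indicator`, `nonzero_count`) are hypothetical and not from the paper, and genotypes and masks are assumed to be plain strings. It reproduces the two-locus example values and counts the non-zero parental pairs for a three-locus target.

```python
from itertools import product


def offspring(parent_j, parent_k, mask):
    """Build the child: locus i comes from the first parent if mask[i] == '0',
    otherwise from the second parent."""
    return "".join(j if m == "0" else k
                   for j, k, m in zip(parent_j, parent_k, mask))


def indicator(target, parent_j, parent_k, mask):
    """Indicator (hypothetical helper): 1 if parents J, K and the mask
    produce the target genotype, else 0."""
    return 1 if offspring(parent_j, parent_k, mask) == target else 0


def nonzero_count(target, mask, alleles="ab"):
    """Count the parental pairs (J, K) with a non-zero indicator."""
    genotypes = ["".join(g) for g in product(alleles, repeat=len(target))]
    return sum(indicator(target, j, k, mask)
               for j in genotypes for k in genotypes)


# The two-locus examples from the text, with mask 01.
print(indicator("aa", "aa", "bb", "01"))
print(indicator("aa", "ab", "ba", "01"))

# For three loci, compare the non-zero count with 2**3.
print(nonzero_count("aaa", "001"), 2 ** 3)
```

Of the 64 parental pairs for three binary loci, only those in which the first parent matches the target on the loci with mask value 0 and the second parent matches on the loci with mask value 1 contribute.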



A more appropriate basis is the Building Block Basis (BBB) [23], [26],
wherein only the Building Block (BB) schemata that contribute to the
formation of a genotype $I$ enter. In this case$^3$

$$\Delta_I(m,t) = P'_I(t) - P'_{I_m}(t)\, P'_{I_{\bar m}}(t) \qquad (3)$$



2 By survival fitness, in the absence of factors such as fertility, differences in mating 
success etc., we mean viability, the probability to reach reproductive age, in distinction to 
absolute fitness which measures the overall reproductive success of a type. 

3 Equation (1) with the substitution of equation (3) has a long history, starting with the



where $P'_{I_m}(t)$ is the selection probability of the BB $I_m$ and
$I_{\bar m}$ is the complementary block such that $I_m \cup I_{\bar m} = I$.
Both blocks are uniquely specified by the associated recombination mask,
$m = m_1 m_2 \ldots m_\ell$. For instance, for three loci, $\ell = 3$, if
$I = aaa$ and $m = 001$ then $I_m = aa*$ and $I_{\bar m} = **a$, where $*$ is
the canonical "wildcard" symbol, familiar from Evolutionary Computation,
indicating that the corresponding locus has been summed over, thus leading to
marginal probabilities. Thus, the probability for the schema $x_1 x_2 *$ is
$P(x_1 x_2 *) = \sum_{x_3=0}^{1} P(x_1 x_2 x_3)$. The selection probability for
the BB schema $I_m$ is $P'_{I_m}(t) = (f_{I_m}(t)/\bar f(t)) P_{I_m}(t)$, where
the fitness of $I_m$ is
$f_{I_m}(t) = \sum_{J \in I_m} f_J P_J(t) / \sum_{J \in I_m} P_J(t)$ and
depends on the actual composition of the population. It is important to
emphasize that the SWLD is distinct from the well-known linkage disequilibrium
coefficient, $D_I(m)$, which depends only on the allele frequencies and the
crossover mask $m$, and not on the fitness landscape. In the case of a flat
fitness landscape, $\Delta_I = D_I$, but not otherwise. In particular, a
population at linkage equilibrium with $D_I = 0$ does not necessarily satisfy
$\Delta_I = 0$. Selection effects generally move the system away from the
Geiringer or Robbins manifold [22], [30], which is the set of points in the
space of populations defined by $D_I = 0$. In terms of BBs,

$$D_I(m,t) = P_I(t) - P_{I_m}(t)\, P_{I_{\bar m}}(t) \qquad (4)$$

with $P_{I_m}(t)$ and $P_{I_{\bar m}}(t)$ being the frequencies, not the
selection probabilities, of the BBs $I_m$ and $I_{\bar m}$. Therefore, linkage
equilibrium, $D_I(m,t) = 0$, implies $P_I(t) = P_{I_m}(t) P_{I_{\bar m}}(t)$,
i.e., the probability to find any genotype $I$ is the same as the product of
the probabilities to find its constituent BBs. Thus, at linkage equilibrium
the SWLD coefficient is given by

$$\Delta_I(m,t) = \left( f_I \bar f(t) - f_{I_m}(t)\, f_{I_{\bar m}}(t) \right) \frac{P_I(t)}{\bar f^2(t)} \qquad (5)$$
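As a numerical sanity check on the Building Block form of the SWLD coefficient, the following sketch evaluates it in two ways for a random three-locus population: once by summing over all parental genotype pairs, and once as a product of two Building Block selection probabilities. All names here (`selection_probs`, `block_prob`, `swld_blocks`, `swld_genotypes`) are illustrative assumptions, as are the random fitness values and frequencies; only proportional selection and the mask convention are taken from the text.

```python
from itertools import product
import random


def selection_probs(freq, fitness):
    """Proportional selection: P'_I = f_I * P_I / mean fitness."""
    fbar = sum(fitness[g] * freq[g] for g in freq)
    return {g: fitness[g] * freq[g] / fbar for g in freq}


def block_prob(p_sel, target, mask, bit):
    """Selection probability of the schema that keeps the target's alleles
    on loci where mask == bit and has wildcards elsewhere."""
    return sum(p for g, p in p_sel.items()
               if all(a == b for a, b, m in zip(g, target, mask) if m == bit))


def swld_blocks(p_sel, target, mask):
    """SWLD written with Building Blocks (product of two schema terms)."""
    return (p_sel[target]
            - block_prob(p_sel, target, mask, "0")
            * block_prob(p_sel, target, mask, "1"))


def swld_genotypes(p_sel, target, mask):
    """SWLD written as a double sum over parental genotypes J and K."""
    formed = 0.0
    for j, pj in p_sel.items():
        for k, pk in p_sel.items():
            child = "".join(a if m == "0" else b
                            for a, b, m in zip(j, k, mask))
            if child == target:
                formed += pj * pk
    return p_sel[target] - formed


random.seed(0)
genos = ["".join(g) for g in product("01", repeat=3)]
weights = [random.random() for _ in genos]
freq = {g: w / sum(weights) for g, w in zip(genos, weights)}
fitness = {g: 1.0 + random.random() for g in genos}  # arbitrary landscape
p_sel = selection_probs(freq, fitness)

for mask in ("001", "010", "011"):
    print(mask, swld_blocks(p_sel, "111", mask),
          swld_genotypes(p_sel, "111", mask))
```

The two columns agree to rounding error, which is the sense in which the Building Blocks carry all the information needed by recombination.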

Note that the structure of $\lambda_I^{JK}(m)$ is particularly simple when
both $J$ and $K$ are BB schemata. For a given $I$ and $m$ one unique BB, $I_m$,
is picked out.

seminal work of Hilda Geiringer [27], who derived a version of the equation for a diploid
population without selection. Versions of the equation were then rederived and discussed
in [28], who used it to discuss the performance of recombinative Genetic Algorithms using
Price's theorem, showing that schemata were a natural consequence of recombination;
and in [22], [29], where the Building Block Hypothesis was examined and it was discussed
under what circumstances recombination led to an increase in the effective fitness of a
given genotype. Also, in the latter the relation to the concept of coarse graining was
emphasized and discussed.

The second BB, $I_{\bar m}$, then enters as the complement of $I_m$ in $I$. This
means that $\lambda_I^{JK}(m)$ is skew diagonal on the indices $J$ and $K$,
with only one non-zero element on that skew diagonal for a given $m$ and $I$. At a
particular locus of the offspring, the associated allele is taken from the first
or second parent according to the value of $m_i$. If it is taken from the first
parent, then the corresponding allele in the second parent is immaterial. As 
seen above, this fact is represented by the normal schema wildcard symbol 
*. It is important to emphasize that the BBs form an alternative basis 
to that of the genotypes. This means that genetic dynamics can not only 
potentially be described without any reference to genotypes but also that 
with the dynamics of the BBs the dynamics of any and all genotypes can be 
derived. For instance, for two loci with binary alleles, a and b, the possible 
genotypes are bb, ba, ab and aa. The corresponding BBs are aa, a*, *a and 
**, where we arbitrarily chose the genotype aa as the type around which to 
develop the BBB. The relationship between the two bases is given by 
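
$$\begin{pmatrix} P_{aa} \\ P_{a*} \\ P_{*a} \\ P_{**} \end{pmatrix} = T \begin{pmatrix} P_{aa} \\ P_{ab} \\ P_{ba} \\ P_{bb} \end{pmatrix} \qquad (6)$$

where

$$T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix} \qquad (7)$$
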
is the coordinate transformation matrix that transforms from one basis to
another. As bases, the genotype basis and the BBB have equivalent dynamics.
However, the dynamics of recombination is fundamentally simpler in the BBB due
to the immense simplification of $\lambda_I^{JK}(m)$ in the latter. In other
words, just as Walsh/Fourier modes [31, 32, 33, 34] are the natural basis for
describing mutation, so BB schemata are the natural basis for describing
homologous recombination. They are the natural effective degrees of freedom of
any genetic system with recombination.
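The change of basis between genotypes and Building Blocks for two loci can be illustrated with a short sketch. The orderings of the two bases, the matrix name `T`, and the helper names are assumptions made for this example; the only ingredient taken from the text is that a wildcard means the corresponding locus is summed over.

```python
GENOTYPES = ["aa", "ab", "ba", "bb"]   # assumed ordering of the genotype basis
BLOCKS = ["aa", "a*", "*a", "**"]      # BBs developed around the genotype aa


def matches(schema, genotype):
    """A genotype belongs to a schema if it agrees on every non-wildcard locus."""
    return all(s == "*" or s == g for s, g in zip(schema, genotype))


# Transformation matrix: row = BB schema, column = genotype.
T = [[1 if matches(s, g) else 0 for g in GENOTYPES] for s in BLOCKS]


def to_blocks(p):
    """Genotype frequencies -> BB (marginal) frequencies."""
    return {s: sum(p[g] for g in GENOTYPES if matches(s, g)) for s in BLOCKS}


def to_genotypes(q):
    """BB frequencies -> genotype frequencies (inverse transformation)."""
    return {
        "aa": q["aa"],
        "ab": q["a*"] - q["aa"],
        "ba": q["*a"] - q["aa"],
        "bb": q["**"] - q["a*"] - q["*a"] + q["aa"],
    }


p = {"aa": 0.4, "ab": 0.3, "ba": 0.2, "bb": 0.1}
q = to_blocks(p)
print(T)
print(q)
print(to_genotypes(q))
```

Because the transformation is invertible, tracking the four BB frequencies is equivalent to tracking the four genotype frequencies, which is the point made in the text about the two bases.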

From Equation (1) for the time evolution of the probability distribution
for the system, we may derive the time evolution of any derived quantity,
such as the average population fitness, which is given by

$$\langle \bar f(t+1) \rangle = \sum_I \frac{f_I^2}{\bar f(t)} P_I(t) - p_c \sum_m p_c(m) \sum_I f_I\, \Delta_I(m,t) \qquad (8)$$
2.1. Why Recombination?

As mentioned in the introduction, a great amount of work has been done
on trying to understand why recombination is ubiquitous. Here, rather than
trying to understand the potential benefits of homologous recombination at
the most general phenomenological or conceptual level, we will restrict
attention to what we may deduce purely from its mathematical representation
in equation (1). Of course, it may be that the benefits of recombination are
not manifest in this model. However, given that the model is the generally
accepted framework for classical population genetics it behooves us to at
least use it as a starting point. Further, we will analyze the model
concentrating on two simple metrics for measuring the benefits of recombination,
asking: i) under what circumstances can recombination lead to the generation
of a higher frequency of a fit offspring than would be the case with only
selection? and, relatedly, ii) under what circumstances can recombination
lead to a larger increase in the average population fitness relative to selection
only? From equations (1) and (8) we see that it is the SWLD coefficient that
quantifies the effect in both cases.

From equation (1), we can see that if $\Delta_I(m) < 0$ then recombination
leads, on average, to a higher frequency of the genotype $I$ than in its absence.
In other words, in this circumstance, recombination yields more of $I$
than there would be otherwise. Conversely, if $\Delta_I(m) > 0$, recombination
provides less of the genotype of interest than would be the case in its
absence. With this in mind, as mentioned, we will
consider two complementary metrics to evaluate the utility of recombination
in time: the change in number of optimal genotypes from one generation
to the next and the change in average population fitness. In the infinite
population limit, the former is given by

$$\Delta P_I(t) = P_I(t+1) - P_I(t) = \left( P'_I(t) - P_I(t) \right) - p_c \sum_m p_c(m)\, \Delta_I(m,t) \qquad (9)$$

For fitness-proportional selection,

$$\Delta P_I(t) = \left( \frac{f_I}{\bar f(t)} - 1 \right) P_I(t) - p_c \sum_m p_c(m)\, \Delta_I(m,t) \qquad (10)$$

The first term on the right-hand side is the increase in the number of
optimal genotypes due to the effect of selection only and the second term the
contribution due to recombination. Now passing to the average population
fitness we have in the infinite population limit

$$\Delta \bar f(t) = \frac{\overline{f^2}(t) - \bar f^2(t)}{\bar f(t)} - p_c \sum_m p_c(m) \sum_I f_I\, \Delta_I(m,t) \qquad (11)$$

where, once again, the first term on the right-hand side is the contribution
from selection only and corresponds to Fisher's Fundamental Theorem, while
the second term is the contribution from recombination.

For both metrics the effect of recombination is controlled by the sign of
$\Delta_I(m)$. Passing from generation $t$ to generation $t+1$, recombination
increases the frequency of a fit genotype $I$ relative to the case of
selection only if and only if $\Delta_I(m,t) < 0$, with the sign and magnitude
of $\Delta_I(m,t)$ fixed completely by the fitness landscape and the actual
population. So, whether recombination is beneficial or not in passing from one
generation to another is, in this sense, equally fixed by the fitness
landscape and the actual population. Similarly, the increase in the average
population fitness from one generation to the next, relative to selection
only, is controlled by the fitness-weighted average of $\Delta_I(m,t)$ and,
hence, once again, by the fitness landscape and the current population. Now,
although we will consider the dynamics of recombination across multiple
generations, it is important to emphasize that, at least within the confines of
the infinite population approximation, the potential behaviors are captured by
the one-generation equation if we consider the space of all possible
populations, given that, in the infinite population limit, the behavior across
multiple generations is found by simply iterating the one-generation equation.
For example, if we iterate the equation and find that $\Delta_I(m,t) < 0$
across each of the iterated generations then we can conclude that
recombination led to a higher incidence of $I$ relative to selection
only over each and every generation.

So, once again, we are led to ask first: When is $\Delta_I(m,t) < 0$? The
answer is when $P'_I(t) < P'_{I_m}(t) P'_{I_{\bar m}}(t)$, i.e., the
probability to select the genotype $I$ is less than the probability to select
its component BB schemata, where the action of recombination is modeled to be
such that the blocks are selected independently. There are several distinct
regimes in which $\Delta_I(m,t) < 0$, which we will explore further and which
categorize the different conditions under which homologous recombination can
be deemed useful. First, there is the regime in which $P_I(t) = 0$, i.e., the
genotype $I$ is non-existent, or at a very small frequency, in the actual
population. In this case $\Delta_I(m,t) < 0$ directly and, remembering that we
are neglecting the effects of mutation, recombination is the only mechanism by
which the genotype $I$ can be generated. This regime emphasizes the search
property of recombination, independent of the fitness landscape.

In general though, as emphasized, the effects of recombination depend
on the fitness landscape. Previous studies [35] have provided evidence that
recombination is particularly beneficial in additive or modular fitness
landscapes. A simple way to see this is to eliminate any bias that comes from
a particular choice of initial population and assume equal proportions for
all genotypes. In this situation, it can be shown that
$\Delta_I(m,t) = (f_I \bar f(t) - f_{I_m}(t) f_{I_{\bar m}}(t)) / (2^\ell \bar f^2(t)) < 0$
for any $m$ that does not cut an epistatic link between loci. For instance,
for a genotype $I = I_1 I_2 \ldots I_\ell$, if $f_I = \sum_i f_{I_i}$, i.e., the
landscape is additive, then $\Delta_I(m,t) < 0$ for any $m$. This result is
also valid when the $I_i$ correspond to multiple loci when recombination does
not cut any epistatic link between the loci. This is the case for a modular
landscape, where loci divide up into disjoint sets with epistasis between the
loci in a set but not between sets. The benefit of recombination in this case
is that it efficiently increases the number of fit modules in an offspring
genotype relative to the numbers present in the parental types. On the
contrary, for a highly epistatic fitness landscape, such as
"needle-in-a-haystack"$^4$, one can show that $\Delta_I(m,t) > 0$ for all $m$.
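The contrast between additive and needle-in-a-haystack landscapes for a uniform population can be checked directly in the two-locus case. The sketch below is a minimal illustration, assuming binary alleles written as `"0"`/`"1"`, the single non-trivial mask 01, and arbitrarily chosen fitness values; the function name `swld_uniform` is hypothetical.

```python
def swld_uniform(fitness, target):
    """SWLD coefficient of `target` for two loci and mask 01 when all four
    genotypes start at frequency 1/4 (proportional selection assumed)."""
    freq = {g: 0.25 for g in fitness}
    fbar = sum(fitness[g] * freq[g] for g in freq)
    p_sel = {g: fitness[g] * freq[g] / fbar for g in freq}
    i, j = target
    first_block = sum(v for g, v in p_sel.items() if g[0] == i)   # schema i*
    second_block = sum(v for g, v in p_sel.items() if g[1] == j)  # schema *j
    return p_sel[target] - first_block * second_block


# Illustrative landscapes (values chosen only for the example).
additive = {"00": 1.0, "10": 2.0, "01": 2.0, "11": 3.0}
needle = {"00": 1.0, "10": 1.0, "01": 1.0, "11": 3.0}

print("additive:", swld_uniform(additive, "11"))
print("needle-in-a-haystack:", swld_uniform(needle, "11"))
```

With these numbers the coefficient for the optimal genotype 11 is negative on the additive landscape (recombination helps) and positive on the needle-in-a-haystack landscape (recombination hinders), in line with the argument above.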

One may argue, of course, that proving that $\Delta_I(m,t) < 0$ over one
generation for a particular choice of population and in particular fitness
landscapes does not correspond to a "universal" mechanism for explaining the
benefits of recombination. That is why in this paper we consider the general
situation of an arbitrary fitness landscape and an arbitrary population, as
well as considering multiple generations. To consider such generality, however,
the price we must pay is to restrict to a small number of loci.

So, we would argue that the two most significant, and potentially related,
regimes in which recombination is beneficial are: i) the search regime, where
recombination searches for fit genotypes that presently either do not exist
or are at very low frequency in the population; and ii) the modular regime,
where recombination allows for the juxtaposition of distinct fit modules in
different parental types into an even fitter offspring. Of course, in the search
regime the question arises as to whether recombination is more efficient than
mutation. This will depend on the Hamming or edit distance between parents
and offspring. An example that exhibits the benefits of recombination over
mutation in generating innovation, and that we will not consider in more
detail, is the development of antibiotic resistance in bacteria through
horizontal gene transfer. Generically, the Hamming or edit distance between
the original parental sequences, say bacterium and virus, and the offspring
sequence, bacterium with viral gene, will be potentially large. In other
words, the difference between the initial and final sequences is not a single
nucleotide, or even a small number of them. In this sense,
recombination-like$^5$ events are the only way to generate innovation that is
associated with large genomic changes, "large" meaning that the Hamming
or edit distance between parental and offspring sequences is large.

4 This landscape corresponds to one optimal genotype with fitness $f_n$, while other types
have equal fitness, $f_h$. It has been used extensively in molecular evolution in the context of
the Eigen model [36], where the dynamics is naturally understood in terms of quasi-species.



3. Modularity and Fitness Landscapes 

Before considering our explicit model we wish to discuss the concept of
modularity in terms of the fitness landscape. For simplicity, we restrict to
binary alleles $x_i = 0, 1$, where $i$ refers to the locus. We will consider two
representations of the fitness function, a direct one where we use the
$f_x = f_{x_1 x_2 \ldots x_\ell}$ directly and another one where the fitness
function can be written as an expansion of the form

$$f_x = F^{(0)} + \sum_{i_1=1}^{\ell} F^{(1)}_{i_1} x_{i_1} + \sum_{i_1=1}^{\ell-1} \sum_{i_2=i_1+1}^{\ell} F^{(2)}_{i_1 i_2} x_{i_1} x_{i_2} + \sum_{i_1=1}^{\ell-2} \sum_{i_2=i_1+1}^{\ell-1} \sum_{i_3=i_2+1}^{\ell} F^{(3)}_{i_1 i_2 i_3} x_{i_1} x_{i_2} x_{i_3} + \cdots \qquad (12)$$

where $F^{(n)}_{i_1 i_2 \ldots i_n}$ represents an epistatic interaction between
$n$ alleles located at loci $i_1, i_2, \ldots, i_n$ and $x_{i_n} = 0, 1$. The
advantage of this latter representation is that the degree of epistasis
between different loci and alleles can be simply deduced.

5 By "recombination-like" we mean any genomic change where one or more
sub-sequences in one or more parental sequences are transferred to an offspring sequence. This
is termed "generalized recombination" in [37] and comprehends unequal crossing over,
transposition, translocation and related operations, as well as homologous recombination.

Any landscape that contains only Fourier components of order $n$ is said to
be an elementary landscape of order $n$. For instance, a completely additive
landscape has a fitness function of the form

$$f_x = \sum_{i=1}^{\ell} F^{(1)}_i x_i$$

and is therefore an elementary landscape of order one, as all Fourier
components other than order one are zero. This is a consequence of the fact that
there are no epistatic interactions between loci. Similarly, a multiplicative
landscape, where

$$f_x = F^{(\ell)}_{i_1 i_2 \ldots i_\ell}\, x_{i_1} x_{i_2} \cdots x_{i_\ell}$$

is an elementary landscape of order $\ell$, as all Fourier components other than
order $\ell$ are zero, there being epistatic interactions of order $\ell$
between the loci but no others. Other landscapes will be intermediate between
these extremes. The "needle-in-a-haystack" landscape, where the "needle"
sequence has fitness $f_n$ and the "hay" sequences fitness $f_h$, is such that
no Fourier coefficients are zero and there are epistatic interactions between
all subgroups of loci.

A particularly interesting class of landscapes are those of "modular" type,
where the loci of a genotype partition into $\ell_m$ disjoint subsets$^6$, or
modules, $s_1, s_2, \ldots, s_{\ell_m}$, and the landscape can then be
decomposed as the sum of the individual fitnesses of these disjoint subsets so
that the fitness of a genotype is given by

$$f_x = \sum_{i=1}^{\ell_m} f_{s_i} \qquad (13)$$

This modularity will obviously leave an imprint in the expansion (12). For
instance, if each module consists of $\ell_m$ loci and there is no epistasis
between the modules then in (12) we will have
$F^{(n)}_{i_1 i_2 \ldots i_n} = 0$ for $n > \ell_m$.

As mentioned previously, a full analysis for $\ell$ loci with arbitrary landscape
and population is prohibitively difficult, so here we will focus on the case of
two loci, as in this case we can study, in the context of an exactly solvable
model, the different regimes under which recombination can be beneficial. So,
restricting ourselves to the case of two loci, $\ell = 2$, we have

$$f_x = f^{(0)} + \sum_{i_1=1}^{2} f^{(1)}_{i_1} x_{i_1} + f^{(2)}_{12}\, x_1 x_2 \qquad (14)$$

where the lower-case $f^{(n)}$ denote the expansion coefficients written
$F^{(n)}$ in equation (12). For an additive (modular) landscape
$f^{(2)}_{12} = 0$. For a multiplicative landscape
$f^{(0)} f^{(2)}_{12} = f^{(1)}_1 f^{(1)}_2$. For a needle-in-a-haystack (NIAH)
landscape $f^{(1)}_1 = f^{(1)}_2 = 0$.

6 Intuitively, these modules will be formed by contiguous loci, such as is natural for an
exon or gene.
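The two-locus expansion can be inverted by hand, which makes the three landscape conditions easy to verify. The sketch below assumes genotypes are written as strings `x1x2` with binary alleles and that fitness values are stored in a dictionary; the function name and the example landscapes are illustrative only.

```python
def expansion_coefficients(f):
    """Solve the two-locus expansion for its coefficients by evaluating it
    at x = 00, 10, 01 and 11 (genotype strings are x1x2)."""
    f0 = f["00"]
    f1 = f["10"] - f["00"]
    f2 = f["01"] - f["00"]
    f12 = f["11"] - f["10"] - f["01"] + f["00"]
    return f0, f1, f2, f12


# Example landscapes (numerical values are arbitrary choices).
additive = {"00": 1.0, "10": 1.5, "01": 1.2, "11": 1.7}
multiplicative = {"00": 1.0, "10": 2.0, "01": 3.0, "11": 6.0}
needle = {"00": 1.0, "10": 1.0, "01": 1.0, "11": 3.0}

for name, landscape in [("additive", additive),
                        ("multiplicative", multiplicative),
                        ("needle", needle)]:
    print(name, expansion_coefficients(landscape))
```

For the additive example the second-order coefficient vanishes, for the multiplicative example the product of the zeroth- and second-order coefficients equals the product of the two first-order ones, and for the needle example both first-order coefficients vanish.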

4. Recombination in an exact two-locus model 

4.1. Analytic results

Clearly, trying to characterize the efficacy of recombination quantitatively,
and in detail, is prohibitively complicated. As we saw in section 2,
however, within the confines of the model we are considering, it can be
characterized using only one fundamental function: the SWLD coefficient.
The SWLD coefficient, though, depends not only on the recombination
distribution, but also on the full fitness landscape and the current state of the
population. In other words, it is a function of a large number of parameters.
To circumvent this problem we consider the case of two loci and calculate the
SWLD coefficient as a function of the fitness landscape and the population.
Note that by two loci here we do not necessarily imply that they represent
"genes". They may represent any two structural units, such as exons, introns
or other motifs, or nucleotides themselves, that can be separated or
recombined by crossover and which can be characterized, as an approximation, by
a fitness landscape that is independent of the rest of the genome.

For two loci all genotypes can be characterized by a multi-index $I = ij$,
with $i, j \in \{0, 1, \ldots, C\}$, where $C + 1$ is the cardinality of the
alphabet that labels the loci, or alleles in the case of genes. For $\ell = 2$,
there is only one non-trivial mask$^7$, $m = 01$, and its conjugate, that lead
to the BBs $i*$ and $*j$. The sum over masks in the general expression for the
SWLD coefficient is thus reduced to only one term which, for two alleles and
writing $\bar i$ for the allele complementary to $i$, is

$$\Delta_{ij} = P'_{ij} - P'_{i*} P'_{*j} = P'_{ij} - (P'_{ij} + P'_{i\bar j})(P'_{ij} + P'_{\bar i j}) \qquad (15)$$

Direct evaluation shows that

$$\Delta_{ij} = \Delta_{\bar i \bar j} = -\Delta_{i \bar j} = -\Delta_{\bar i j} = \Delta, \qquad (16)$$

and thus the evolution equations in the two-allele, two-locus problem are:

$$P_{ij}(t+1) = P'_{ij}(t) - p_c\, \Delta_{ij}(t) \qquad (17)$$

The whole state of this system can be characterized by 3 (= 4 - 1) frequencies
that are naturally represented in a three-dimensional simplex. Figure 1
shows typical population trajectories in the two-locus, two-allele system for
a generic landscape, with $x = 11$ arbitrarily taken as the optimum genotype
and several different initial population ratios.

7 The masks $m = 00$ and $11$ correspond to cloning, where both offspring loci come from
a single parent.




Figure 1: Geiringer manifold (colored) and some trajectories for some random initial 
populations. The system's convergence to dominance of the optimal genotype is indicated 
by the arrow. 
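Trajectories of the kind shown in Figure 1 can be generated by iterating the two-locus evolution equations. The following is a minimal sketch under stated assumptions: the landscape, the initial population and the recombination probability are arbitrary illustrative choices, and the names `step`, `delta_ij` and `SIGN` are hypothetical. It also checks the sign pattern of the SWLD coefficients across the four genotypes.

```python
# Sign with which the common SWLD value enters each genotype's equation.
SIGN = {"11": 1, "00": 1, "10": -1, "01": -1}


def selection_probs(p, f):
    """Proportional selection: P'_ij = f_ij * P_ij / mean fitness."""
    fbar = sum(f[g] * p[g] for g in p)
    return {g: f[g] * p[g] / fbar for g in p}


def delta_ij(ps, g):
    """SWLD of genotype g from its two Building Blocks (i* and *j)."""
    first = sum(v for h, v in ps.items() if h[0] == g[0])
    second = sum(v for h, v in ps.items() if h[1] == g[1])
    return ps[g] - first * second


def step(p, f, pc):
    """One generation of selection followed by recombination."""
    ps = selection_probs(p, f)
    delta = ps["11"] * ps["00"] - ps["10"] * ps["01"]
    return {g: ps[g] - pc * SIGN[g] * delta for g in p}


f = {"00": 1.0, "10": 1.5, "01": 1.2, "11": 2.0}   # 11 is the optimum
p = {"00": 0.7, "10": 0.1, "01": 0.1, "11": 0.1}   # illustrative start
ps0 = selection_probs(p, f)
delta0 = ps0["11"] * ps0["00"] - ps0["10"] * ps0["01"]

for generation in range(500):
    p = step(p, f, pc=0.5)

print(p)
```

For this landscape the population converges to dominance of the optimal genotype 11, and the frequencies remain normalized at every step because the four recombination terms cancel in the sum.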
4.1.1. Muller's Ratchet

Muller's ratchet [5]$^8$, and variations thereof, have been frequently invoked
in considerations of the potential benefits of recombination. Essentially, the
argument is that recombination increases the evolvability of a population by
allowing beneficial mutations on different genomes to be recombined into one,
more efficiently than the process of generating a double mutation. Similarly,
deleterious mutations can be eliminated more efficiently from a population
by having them recombined into a single genome, thus allowing selection to
eliminate them more efficiently. We will consider these arguments in the
context of our two-locus system. From Equations (10) and (11) we have

$$\Delta P_{ij}(t) = \left( \frac{f_{ij}}{\bar f(t)} - 1 \right) P_{ij}(t) - p_c\, \Delta_{ij}(t) \qquad (18)$$

$$\Delta \bar f(t) = \frac{\overline{f^2}(t) - \bar f^2(t)}{\bar f(t)} - p_c \sum_{ij} f_{ij}\, \Delta_{ij}(t) \qquad (19)$$

There are two regimes of interest related to Muller's ratchet: one is that
advantageous mutations appear in a population and the second that
deleterious mutations appear. The question is: How does recombination affect
the dynamics of these mutants? Considering the first case, we will model
it in the present framework by imagining that the double mutant genotype
11 is the fittest, with the single mutants 01 and 10 being less fit than the
double mutant but fitter than the wild type 00. If we consider the population
to be such that the fit double mutant is absent, i.e., $P_{11}(t) = 0$$^9$, then
$\Delta_{11} = (P'_{11}(t) P'_{00}(t) - P'_{10}(t) P'_{01}(t)) = -P'_{10}(t) P'_{01}(t) < 0$. So

$$\Delta P_{11}(t) = \left( \frac{f_{11}}{\bar f(t)} - 1 \right) P_{11}(t) + p_c\, P'_{10}(t) P'_{01}(t) \qquad (20)$$

$$\Delta \bar f(t) = \frac{\overline{f^2}(t) - \bar f^2(t)}{\bar f(t)} + p_c\, (f_{11} - f_{01} - f_{10} + f_{00})\, P'_{10}(t) P'_{01}(t). \qquad (21)$$

From Equation (20) we see that the number of fit double mutants increases
from generation $t$ to generation $t+1$ due to the effect of recombination
relative to selection-only dynamics. This is, in fact, independent of the
fitness landscape, being associated with the search regime of recombination
alluded to in section 2.1. In contrast, in Equation (21), we see that the
average population fitness will increase in the presence of recombination if
and only if $f^{(2)}_{12} = (f_{11} - f_{01} - f_{10} + f_{00}) > 0$, which is a
direct measure of the degree of (additive) epistasis between the two loci.
Interestingly, for a purely additive landscape, $f^{(2)}_{12} = 0$ and so
recombination is neutral in this setting. For the other genotypes, the
fraction of wild types increases due to the effect of recombination, while the
frequency of single mutants decreases. What happens in the case where
$P_{11}(t) \neq 0$ will be considered in section 5, as the benefit from
recombination then depends on the actual population as
well as the landscape.

8 A good, although somewhat dated, review of the different potential mechanisms, and
in particular Muller's ratchet, by which recombination can be beneficial can be found in
[?].

9 In this case there is an initial linkage disequilibrium, i.e.,
$(P_{11}(t) P_{00}(t) - P_{10}(t) P_{01}(t)) \neq 0$.
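The two statements above, that recombination always creates the absent double mutant while its effect on mean fitness follows the sign of the epistasis, can be checked over a single generation. This is a sketch under assumptions: the population, the two landscapes and the recombination probability are arbitrary example values, and the helper names are hypothetical.

```python
SIGN = {"11": 1, "00": 1, "10": -1, "01": -1}


def selection_probs(p, f):
    """Proportional selection: P'_ij = f_ij * P_ij / mean fitness."""
    fbar = sum(f[g] * p[g] for g in p)
    return {g: f[g] * p[g] / fbar for g in p}


def next_generation(p, f, pc):
    """Selection followed by recombination with probability pc."""
    ps = selection_probs(p, f)
    delta = ps["11"] * ps["00"] - ps["10"] * ps["01"]
    return {g: ps[g] - pc * SIGN[g] * delta for g in p}


def mean_fitness(p, f):
    return sum(f[g] * p[g] for g in p)


def epistasis(f):
    return f["11"] - f["10"] - f["01"] + f["00"]


pc = 0.5
p0 = {"00": 0.8, "10": 0.1, "01": 0.1, "11": 0.0}   # double mutant absent
landscapes = {
    "additive": {"00": 1.0, "10": 1.5, "01": 1.5, "11": 2.0},
    "synergistic": {"00": 1.0, "10": 1.5, "01": 1.5, "11": 3.0},
}

results = {}
for name, f in landscapes.items():
    with_rec = next_generation(p0, f, pc)
    without_rec = next_generation(p0, f, 0.0)
    gain_11 = with_rec["11"] - without_rec["11"]
    gain_fitness = mean_fitness(with_rec, f) - mean_fitness(without_rec, f)
    results[name] = (gain_11, gain_fitness)
    print(name, "epistasis:", epistasis(f),
          "extra P11:", gain_11, "extra mean fitness:", gain_fitness)
```

In both landscapes recombination produces the same positive amount of the double mutant, but the extra mean fitness is zero in the additive case and positive only when the epistasis is positive.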

Turning now to the case of deleterious mutants: in this case we take the
wild type to be the genotype 11 and the types 01 and 10 to be deleterious
single mutants and 00 to be an even more deleterious double mutant. In this
case, just as for beneficial mutants, with the double mutant 00 initially
absent,
$\Delta_{11} = (P'_{11}(t) P'_{00}(t) - P'_{10}(t) P'_{01}(t)) = -P'_{10}(t) P'_{01}(t) < 0$
and hence the proportion of optimal wild types 11 increases. In terms of
average population fitness, the increase from generation
$t$ to $t+1$ is given by Equation (21). In other words, the change in average
population fitness per generation for the case of beneficial versus deleterious
mutations is identical if we are considering the same fitness landscape.

4.1.2. Asymptotic behavior of $\Delta$

Before going on to consider the full numerical solution of the two-locus model we will consider what can be said analytically about the asymptotic behavior of the system. The full parameter set which controls the dynamics is $P_{11}(0)$, $P_{00}(0)$, $P_{10}(0)$, $P_{01}(0)$, $f_{10}$, $f_{01}$, $f_{11}$, $f_{00}$ and $p_c$. Due to the constraint $P_{11}(0) + P_{00}(0) + P_{10}(0) + P_{01}(0) = 1$ one of the genotypic frequencies can be eliminated. Although there are still 8 parameters that control the dynamics, the asymptotic behavior can be most naturally written in terms of just two parameters

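$$C(t) \equiv \frac{P_{11}P_{00}}{P_{10}P_{01}} \qquad (22)$$

where, for brevity, we use $P_{ij}$ for $P_{ij}(t)$, and

$$\Lambda = \frac{f_{10}f_{01}}{f_{11}f_{00}}. \qquad (23)$$

The one generation evolution equation for $C(t)$ is

$$C(t+1) = \frac{P_{11}(t+1)P_{00}(t+1)}{P_{10}(t+1)P_{01}(t+1)} = \frac{(P'_{11} - p_c\Delta)(P'_{00} - p_c\Delta)}{(P'_{10} + p_c\Delta)(P'_{01} + p_c\Delta)} = \frac{\frac{P'_{11}P'_{00}}{\Delta} - p_c(P'_{11} + P'_{00}) + p_c^2\Delta}{\frac{P'_{10}P'_{01}}{\Delta} + p_c(P'_{10} + P'_{01}) + p_c^2\Delta} \qquad (24)$$
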

Without loss of generality we choose $I = 11$ to be the optimal genotype. The evolution of the genotype frequencies, $P_{ij}$, as given by equation (?), ensures the eventual dominance of the optimal genotype, i.e., $P_{11} \to 1$, $P_{00}, P_{10}, P_{01} \to 0$ with $t$. We suppose a priori that the limit

$$C_\infty = \lim_{t\to\infty} C(t) \qquad (25)$$

exists, which in turn implies that

$$\lim_{t\to\infty} \frac{P'_{11}P'_{00}}{\Delta} = \frac{C_\infty}{C_\infty - \Lambda} \qquad (26)$$

and

$$\lim_{t\to\infty} \frac{P'_{10}P'_{01}}{\Delta} = \frac{\Lambda}{C_\infty - \Lambda}. \qquad (27)$$

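With these elements in hand we can calculate the putative limit of equation (24) to find:

$$C_\infty = \frac{\frac{C_\infty}{C_\infty - \Lambda} - p_c}{\frac{\Lambda}{C_\infty - \Lambda}}. \qquad (28)$$

Solving this last equation for $C_\infty$ we obtain:

$$C_\infty = \frac{p_c \Lambda}{p_c + \Lambda - 1}. \qquad (29)$$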

Finally, since $\Delta = P'_{11}P'_{00} - P'_{10}P'_{01}$ and $C = \frac{P_{11}P_{00}}{P_{10}P_{01}}$, so that $\frac{P'_{11}P'_{00}}{P'_{10}P'_{01}} = \frac{C}{\Lambda}$, we note that the negativity of $\Delta$ is equivalent to the condition

$$\frac{C_\infty}{\Lambda} = \frac{p_c}{p_c + \Lambda - 1} < 1, \qquad (30)$$

which reduces to $\Lambda > 1$ for $p_c \neq 0$. So, we can see that the asymptotic benefit of recombination in terms of increasing the fraction of optimal genotypes relative to selection only is determined by only 2 parameters, $\Lambda$ and $p_c$, and is independent of the initial population.
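The claim that the asymptotic value of $C$ does not depend on the initial population can be checked numerically. The sketch below iterates the selection plus recombination map from two different starting populations and compares the resulting $C(t)$ with the closed form $p_c\Lambda/(p_c + \Lambda - 1)$ obtained by solving the limiting equation for $C_\infty$. The landscape, the value $p_c = 0.5$, the number of generations and all function names are assumptions made for illustration only.

```python
GENOTYPES = ("00", "01", "10", "11")


def step(P, f, p_c):
    """One generation of selection followed by recombination."""
    f_bar = sum(f[g] * P[g] for g in GENOTYPES)
    P_sel = {g: f[g] * P[g] / f_bar for g in GENOTYPES}
    delta = P_sel["11"] * P_sel["00"] - P_sel["10"] * P_sel["01"]
    return {
        "11": P_sel["11"] - p_c * delta,
        "00": P_sel["00"] - p_c * delta,
        "10": P_sel["10"] + p_c * delta,
        "01": P_sel["01"] + p_c * delta,
    }


def c_ratio(P):
    """C = P_11 P_00 / (P_10 P_01), Equation (22)."""
    return P["11"] * P["00"] / (P["10"] * P["01"])


def lambda_param(f):
    """Lambda = f_10 f_01 / (f_11 f_00), Equation (23)."""
    return f["10"] * f["01"] / (f["11"] * f["00"])


def c_infinity(f, p_c):
    """Closed-form fixed point of the asymptotic map for C."""
    lam = lambda_param(f)
    return p_c * lam / (p_c + lam - 1.0)


def c_after(P, f, p_c, generations):
    """Value of C after iterating the map for a number of generations."""
    for _ in range(generations):
        P = step(P, f, p_c)
    return c_ratio(P)


f = {"00": 1.0, "01": 1.8, "10": 1.8, "11": 2.6}  # additive landscape, 11 optimal
p_c = 0.5  # assumed recombination probability
starts = [
    {"00": 0.95, "01": 0.025, "10": 0.0249, "11": 0.0001},
    {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25},
]
for P0 in starts:
    print(c_after(P0, f, p_c, 150), c_infinity(f, p_c), lambda_param(f))
```

For this additive example both runs should settle on the same value of $C$, which lies below $\Lambda$, so that $\Delta$ is asymptotically negative.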

With this formula in hand, we can easily map any fitness landscape to a range of values for $\Lambda$ and thus determine if recombination will be asymptotically favorable for that particular landscape. Explicitly, if we consider the general parametrized two-locus two-allele landscape

$$f = a + b_1 x_1 + b_2 x_2 + c x_1 x_2 \qquad (31)$$

we have

$$\Lambda = \frac{(a + b_1)(a + b_2)}{a(a + b_1 + b_2 + c)} \qquad (32)$$

where c is the measure of epistasis between the two loci (see footnote 10). To fix some intuition we can think of the genotype $I = 00$ as the wild type, the genotypes $I = 01$ and 10 as single mutants and $I = 11$ as a double mutant which is the optimal genotype. To simplify further the visualization of the asymptotic behavior, we will assume that $b = b_1 = b_2$, i.e., that the two mutants have the same fitness. Given that 11 is the optimal genotype we have that $f_{11} > f_{10}$, $f_{11} > f_{01}$ and $f_{11} > f_{00}$, which implies that $2b + c > 0$ and $b + c > 0$, so there is a limit to how negative the epistasis between the loci may be. We group the results into two sets for fixed values of b; one "low" (b = 0.1), and one "medium" (b = 0.8).
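The mapping from landscape parameters to $\Lambda$ in Equation (32) can be sketched in a few lines of Python. The function names, the tolerance and the example values of a, b and c are hypothetical choices made for illustration; only the formula itself comes from the text.

```python
def lambda_param(a, b1, b2, c):
    """Equation (32) for the landscape f = a + b1*x1 + b2*x2 + c*x1*x2."""
    return (a + b1) * (a + b2) / (a * (a + b1 + b2 + c))


def eleven_is_optimal(a, b1, b2, c):
    """True when the double mutant 11 is strictly fitter than 00, 10 and 01."""
    f00, f10, f01, f11 = a, a + b1, a + b2, a + b1 + b2 + c
    return f11 > max(f00, f10, f01)


def asymptotic_verdict(a, b1, b2, c, tol=1e-9):
    """Classify the asymptotic effect of recombination from the value of Lambda."""
    lam = lambda_param(a, b1, b2, c)
    if lam > 1.0 + tol:
        return "beneficial"
    if lam < 1.0 - tol:
        return "unfavorable"
    return "neutral"


a, b = 1.0, 0.8
for c in (-0.5, 0.0, 0.64, 2.0):  # 0.64 = b*b/a is the multiplicative case
    print(c, eleven_is_optimal(a, b, b, c),
          round(lambda_param(a, b, b, c), 4), asymptotic_verdict(a, b, b, c))
```

A small tolerance is used in the comparison with 1 because the multiplicative case $ac = b^2$ gives $\Lambda = 1$ only up to floating-point rounding.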

Small values of b relative to c correspond to highly epistatic landscapes, while an additive landscape with no epistasis has c = 0. In this case $\Lambda > 1$ and so recombination is asymptotically beneficial. Large values of a relative to b and c correspond to a more neutral fitness landscape, where selection effects are small. For a multiplicative landscape $ac = b^2$ and, hence, $\Lambda = 1$. In this case, recombination is asymptotically neutral. The dependence of the parameter $\Lambda$ ($= \frac{f_{10}f_{01}}{f_{11}f_{00}}$) on a and c for these two values of b is shown in Figures 2 and 3. Values of $\Lambda$ greater than 1 mean that the iterates must eventually reach negative values of $\Delta$. The sign of $\Delta$ is then conserved, although the magnitude approaches zero as the system reaches linkage equilibrium associated with a population dominated by the optimum genotype. The opposite happens when $\Lambda < 1$. Note that the locus defined by the intersection of the surfaces $\Lambda(a, c)$ and $\Lambda = 1$ is given by $b^2 = ac$ and corresponds to the case of multiplicative landscapes. Thus, for these landscapes recombination is asymptotically neutral.

10 Note that here, in contrast to some earlier works, we define epistasis relative to the additive limit not the multiplicative one.



Figure 2: $\Lambda(a,c) = \frac{f_{10}f_{01}}{f_{11}f_{00}}$ for b = 0.1. The solid plane, $\Lambda = 1$, separates those fitness landscapes that according to Eq. (30) will eventually benefit from recombination from those that don't.

5. Exact Numerical Results 

Turning now to the non-asymptotic behavior, we performed an exhaustive numerical exploration of the 8 dimensional parameter space of the two-locus, two-allele system to determine under which conditions recombination is beneficial in terms of our two metrics, (18) and (19), and as characterized by the SWLD coefficient. In such a high dimensional space, visualization of the resulting graphs requires separation into several distinct cases.

Figure 3: $\Lambda(a,c) = \frac{f_{10}f_{01}}{f_{11}f_{00}}$ for b = 0.8. The solid plane, $\Lambda = 1$, separates those fitness landscapes that according to Eq. (30) will eventually benefit from recombination from those that don't.

5.1. Recombination as a function of fitness landscape

We first consider graphs for arbitrary fitness landscapes but for a fixed initial population, with a further subdivision into cases made according to the type of initial population. Two kinds of graphs are provided, one that displays the value of the SWLD coefficient in layers, each corresponding to a different generation, and another that displays $\Delta_{\bar{f}}$ ($\Delta$fitness), defined as the change in average fitness between generation t and generation t + 1 in a population evolving with both selection and recombination minus the change in average fitness of the same population but evolving with selection only. This enables us to isolate the contribution that is purely due to recombination:

$$\Delta_{\bar{f}}(t) = \bar{f}_{\text{with recombination}} - \bar{f}_{\text{without recombination}}. \qquad (33)$$

Thus, if $\Delta_{\bar{f}}(t)$ is positive then recombination proves to be beneficial for that particular circumstance. In the graphs we show four representative time slices - 8, 16, 24 and 32 generations after the initial one. The plane $\Delta_{11} = 0$ that separates the recombination advantageous/disadvantageous regimes is displayed (turquoise in the online version). For a given generation, those values of a and c where $\Delta_{11} < 0$ are shaded in red (below the $\Delta_{11} = 0$ plane), while those where $\Delta_{11} > 0$ correspond to a darker shading (above the $\Delta_{11} = 0$ plane).
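A minimal sketch of how such time slices might be generated is given below. It evolves the same initial population with and without recombination and records, at generations 8, 16, 24 and 32, the SWLD coefficient of the recombining population together with the difference between the one-generation fitness gains of the two runs. Reading $\Delta_{\bar{f}}$ as a comparison of two independently evolved trajectories is an assumption about the definition above, as are the value $p_c = 0.5$, the landscape values and all function names.

```python
GENOTYPES = ("00", "01", "10", "11")


def landscape(a, b, c):
    """f = a + b*(x1 + x2) + c*x1*x2 with equal single-mutant effects."""
    return {"00": a, "01": a + b, "10": a + b, "11": a + 2 * b + c}


def mean_fitness(P, f):
    return sum(f[g] * P[g] for g in GENOTYPES)


def step(P, f, p_c):
    """One generation; p_c = 0 gives selection only. Returns (P_next, Delta)."""
    f_bar = mean_fitness(P, f)
    P_sel = {g: f[g] * P[g] / f_bar for g in GENOTYPES}
    delta = P_sel["11"] * P_sel["00"] - P_sel["10"] * P_sel["01"]
    sign = {"11": -1.0, "00": -1.0, "10": 1.0, "01": 1.0}
    return {g: P_sel[g] + sign[g] * p_c * delta for g in GENOTYPES}, delta


def time_slices(P0, f, p_c, slices=(8, 16, 24, 32)):
    """Return {t: (Delta(t), fitness-gain difference at t)} for the given slices."""
    P_rec, P_sel_only = dict(P0), dict(P0)
    out = {}
    for t in range(max(slices) + 1):
        next_rec, delta = step(P_rec, f, p_c)
        next_sel, _ = step(P_sel_only, f, 0.0)
        gain_rec = mean_fitness(next_rec, f) - mean_fitness(P_rec, f)
        gain_sel = mean_fitness(next_sel, f) - mean_fitness(P_sel_only, f)
        if t in slices:
            out[t] = (delta, gain_rec - gain_sel)
        P_rec, P_sel_only = next_rec, next_sel
    return out


P0 = {"00": 0.95, "01": 0.025, "10": 0.0249, "11": 0.0001}
for c in (-0.3, 0.0, 0.3, 2.0):
    for t, (delta, dfit) in time_slices(P0, landscape(1.0, 0.8, c), 0.5).items():
        print(f"c={c:+.1f} t={t:2d} Delta={delta:+.3e} Delta_fbar={dfit:+.3e}")
```

Scanning a and c over a grid with this routine would give surfaces of the kind described in the text, one layer per time slice.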

5.1.1. Initial Population $P_{00} \approx 1$

In this first case we consider the dynamics when the initial population is dominated by the non-optimal wild type 00, with $P_{00}(0) = 0.95$, $P_{01}(0) = 0.025$, $P_{10}(0) = 0.0249$, $P_{11}(0) = 0.0001$. So, we are here interested in the effects of recombination on the dynamics of favourable mutations as a function of the fitness landscape and in the background of an initial population dominated by a non-optimal wild type. We fix b = 0.8 and study the variation in $\Delta$ as a function of a and c, remembering the restrictions $2b + c > 0$ and $b + c > 0$, so that $c > -0.8$. The most notable feature of Figure 4 is that negative values of $\Delta$ are most associated with additive or negatively epistatic landscapes. Note that earlier in the evolution, at t = 8, the benefits of recombination are clear to see, even for quite positively epistatic interactions. This, however, is partially due to this region still being in the search regime, as the initial frequency of optimal genotypes was so low. Gradually, the population moves away from the search regime and enters the modular regime, where we see that it is only for landscapes that are either weakly positively epistatic, additive or negatively epistatic that recombination is beneficial.

Turning now to the graphs of the change in average fitness of the population; at t = 8, in the search regime, we see that recombination leads to an increase in average population fitness over and above that of selection only for basically all landscapes. This is due to the addition of optimal genotypes in an initial population dominated by the non-optimal wild type. Gradually, however, the effect of recombination diminishes as one enters the modular regime so that for positively epistatic landscapes the difference between selection only and recombinative dynamics is minimal. However, we note that there is still a strong, pronounced effect for either weakly positively epistatic, additive or weakly negatively epistatic landscapes.

So, how do we interpret these results in terms of BBs? Both in the search and modular regimes the advantage of recombination is associated with the fact that BBs of the optimal genotype, 1* and *1, are recombined to form the type 11. As the graphs show, this recombination of BBs is, in fact, a more efficient process in generating optimal types and increasing overall population fitness than selection alone for weakly epistatic landscapes. In fact, the benefit in the search regime is actually relatively independent of the degree of epistasis of the landscape. Later on though, in the modular regime, the generation of optimal genotypes by recombining optimal BBs competes against their generation by pure selection effects. For positively epistatic landscapes, once there are enough optimal types selection can produce new ones as efficiently as, or more efficiently than, recombination. For modular landscapes, however, recombination retains its advantage. Indeed, this is what characterizes the modular regime, i.e., that weakly epistatic BBs or modules are juxtaposed by recombination into even fitter genotypes, leading to a faster evolution and a faster increase in average population fitness.

Figure 4: Value of $\Delta$ at different generations for the two-locus two-allele system as a function of fitness landscape, characterized by a and c. The initial population is $P_{00}(0) = 0.95$, $P_{01}(0) = 0.025$, $P_{10}(0) = 0.0249$, $P_{11}(0) = 0.0001$. The $\Delta = 0$ plane has been marked to distinguish between conditions in which recombination is favorable ($\Delta < 0$) or not. The curve on the plane is $ac = b^2$, the condition for a multiplicative landscape.

Figure 5: Value of $\Delta_{\bar{f}}$ at different generations for the two-locus two-allele system as a function of fitness landscape, characterized by a and c. The initial population is $P_{00}(0) = 0.95$, $P_{01}(0) = 0.025$, $P_{10}(0) = 0.0249$, $P_{11}(0) = 0.0001$. The $\Delta_{\bar{f}} = 0$ plane has been marked to distinguish between conditions in which recombination is favorable ($\Delta_{\bar{f}} > 0$) or not.



5.1.2. Initial Population $P_{11} \approx 1$

We now turn to the case where the initial population is dominated by the optimal genotype as the wild type with the presence of genotypes with a single deleterious mutation and a small proportion of deleterious double mutant genotypes. Specifically, $P_{11}(0) = 0.90$, $P_{10}(0) = 0.05$, $P_{01}(0) = 0.049$ and $P_{00}(0) = 0.001$. The question now is: What are the dynamics of the deleterious mutations in the population as a function of the landscape parameters? Once again, we fix b = 0.8 and study the variation in $\Delta$ as a function of a and c.
In Figure 6 the first thing to notice is that, in distinction to the case where the initial population is dominated by the non-optimal genotype, here there is no behavior associated with the search regime, as the optimal genotype is already dominant in the population. Thus, for positively epistatic landscapes the difference due to recombination is small. However, for additive or negatively epistatic landscapes we see that recombination is advantageous, with the advantage being more significant in the presence of negative epistasis.

Figure 6: Value of $\Delta$ at different generations for the two-locus two-allele system as a function of fitness landscape, characterized by a and c. The initial population is $P_{11}(0) = 0.90$, $P_{10}(0) = 0.05$, $P_{01}(0) = 0.049$ and $P_{00}(0) = 0.001$. The $\Delta = 0$ plane has been marked to distinguish between conditions in which recombination is favorable ($\Delta < 0$) or not. The curve on the plane is $ac = b^2$, the condition for a multiplicative landscape.

Considering now the average population fitness, we see clearly in Figure 7 how the advantage of recombination manifests itself in the modular regime where epistasis is weak. Interestingly, we see how negatively epistatic landscapes are, in the early part of the evolution, associated with $\Delta_{\bar{f}} < 0$. This is due to the fact that for negative epistasis the overall contribution to the population fitness of a deleterious double mutant and an optimal genotype is less than that of two types each with a single deleterious mutation. However, after recombination has created the less fit double mutant, selection can eliminate the mutations, thereby purifying the population more efficiently than selection alone. The more modular the landscape the more efficient this process becomes.

Figure 7: Value of $\Delta_{\bar{f}}$ at different generations for the two-locus two-allele system as a function of fitness landscape, characterized by a and c. The initial population is $P_{11}(0) = 0.90$, $P_{10}(0) = 0.05$, $P_{01}(0) = 0.049$ and $P_{00}(0) = 0.001$. The $\Delta_{\bar{f}} = 0$ plane has been marked to distinguish between conditions in which recombination is favorable ($\Delta_{\bar{f}} > 0$) or not.

5.1.3. Initial Population $P_{11} \approx 0$, $P_{00} \approx \frac{1}{2}$, $P_{01} \approx P_{10} \approx \frac{1}{4}$

We now consider a scenario similar to that of sub-section 5.1.1, where the initial proportion of optimal genotypes is very small; now, however, the frequency of the BBs, 1* and *1, represented by the beneficial mutants 01 and 10, relative to the less fit wild type 00 is much higher. Concretely, the initial population is: $P_{11}(0) = 0.0001$, $P_{10}(0) = 0.2499$, $P_{01}(0) = 0.25$ and $P_{00}(0) = 0.5$, so that the BBs 1* and *1 each form about a quarter of the population.

Figure 8: Value of $\Delta$ at different generations for the two-locus two-allele system as a function of fitness landscape, characterized by a and c. The initial population is $P_{00}(0) = 0.5$, $P_{01}(0) = 0.25$, $P_{10}(0) = 0.2499$, $P_{11}(0) = 0.0001$. The $\Delta = 0$ plane has been marked to distinguish between conditions in which recombination is favorable ($\Delta < 0$) or not. The curve on the plane is $ac = b^2$, the condition for a multiplicative landscape.

We see in Figure 8 that the graphs are qualitatively similar to those of Figure 4. The chief difference now is that recombination is even more advantageous in the search regime than before. This is due to the wider availability of the BBs 1* and *1, thus facilitating the construction of the optimal type 11. In fact, we see that at t = 8 recombination is favorable for all landscapes within the parameter range considered. As evolution progresses, as before, we see a passage from the search regime to the modular regime, where the relative benefit of recombination is restricted to weakly positively epistatic, additive or weakly negatively epistatic landscapes.

Similarly, in Figure 9 we see a similarity with the corresponding graphs of Figure 5, the average population fitness showing a strong increase at t = 8, relative to the selection only case, due to the efficient formation of the optimal type, which in its turn is due to the large number of BBs in the population. Even for strongly epistatic landscapes there is a strong benefit to recombination in this regime. At later times, in the modular regime, we see that the advantage of recombination is associated with additive or weakly epistatic landscapes, i.e., modular landscapes.

Figure 9: Value of $\Delta_{\bar{f}}$ at different generations for the two-locus two-allele system as a function of fitness landscape, characterized by a and c. The initial population is $P_{11}(0) = 0.0001$, $P_{10}(0) = 0.2499$, $P_{01}(0) = 0.25$ and $P_{00}(0) = 0.5$. The $\Delta_{\bar{f}} = 0$ plane has been marked to distinguish between conditions in which recombination is favorable ($\Delta_{\bar{f}} > 0$) or not.

So, we see that the principal effect of increasing the BB frequency in the initial population is to accelerate the rate of evolution so that the frequency of the optimal genotype and the average population fitness increase more rapidly.






5.1.4. Initial Population $P_{11} \approx 0$, $P_{00} \approx 0$

We now look at an even more extreme case, where the initial population is completely dominated by the single mutants 01 and 10, with the initial population being $P_{11}(0) = 0.0001$, $P_{10}(0) = 0.4998$, $P_{01}(0) = 0.5$ and $P_{00}(0) = 0.0001$. Qualitatively the results are as in sub-sections 5.1.3 and 5.1.1, the strong presence of the BBs 1* and *1 leading to a very efficient production of the optimal genotype 11. This is, in fact, another good illustration of Muller's ratchet. Although recombination leads to the generation of optimal genotypes it also leads to the production of the sub-optimal double mutants 00. The latter, however, as the graphs clearly show, are flushed out by selection. In fact, as Figure 11 shows, they are produced and then flushed out most efficiently in the presence of recombination for modular landscapes when compared to selection only.

5.1.5. Initial Homogeneous Population $P_{ij} = 0.25$

The final initial population type we will consider is that of a uniform initial population where all genotypes have the same initial frequency, 0.25. Here we see behaviour that is qualitatively similar to that found for other populations. The chief difference here is that, given the ample presence of the optimal genotype in the initial population, there is no search regime and so the dynamics begins and remains in the modular regime. This is clear from the fact that, both for $\Delta$ and $\Delta_{\bar{f}}$, the benefits of recombination are present for modular landscapes, i.e., those with additive or weakly epistatic interactions.

5.2. Recombination as a function of population 

Having explored the effect of recombination on the space of fitness landscapes, by varying continuously the landscape parameters a and c for a variety of distinct initial populations, we now take the complementary viewpoint and consider how the effect of recombination varies as the initial population is varied continuously for a variety of fixed fitness landscapes. Due to the conservation of probability, the population vector is characterized by only three frequencies. For simplicity of visualization we will consider initial populations such that $P_{01}(0) = P_{10}(0)$ and consider the population dynamics as a function of $P_{11}(0)$ and $P_{01}(0)$.

A general observation on all the graphs in this section is that, since there is generic convergence to the optimal genotype, $P_{11} = 1$, all the surfaces clearly have $\Delta = 0$ in the $P_{11} = 1$ corner.

Figure 10: Value of $\Delta$ at different generations for the two-locus two-allele system as a function of fitness landscape, characterized by a and c. The initial population is $P_{11}(0) = 0.0001$, $P_{10}(0) = 0.4998$, $P_{01}(0) = 0.5$ and $P_{00}(0) = 0.0001$. The $\Delta = 0$ plane has been marked to distinguish between conditions in which recombination is favorable ($\Delta < 0$) or not. The curve on the plane is $ac = b^2$, the condition for a multiplicative landscape.



5.2.1. Additive landscape: $a = c = 0$ ($\Lambda \to \infty$)

The first landscape we will consider is an additive landscape (c = 0), where the fitness of the non-mutant genotype 00 is zero (a = 0). For this landscape the tendency is clear: the more BBs and the fewer optimal types there are, the more recombination helps. This is a manifestation of the search regime. Note that in this landscape recombination, in terms of $\Delta$, is never unfavorable, for any values of the initial population: since $f_{00} = 0$ we have $P'_{00} = 0$ and hence $\Delta = -P'_{10}P'_{01} \leq 0$. However, we can see that the SWLD increases in time, approaching zero asymptotically, this regime being associated with the approach to a population completely dominated by the optimal genotype.

Figure 11: Value of $\Delta_{\bar{f}}$ at different generations for the two-locus two-allele system as a function of fitness landscape, characterized by a and c. The initial population is $P_{11}(0) = 0.0001$, $P_{10}(0) = 0.4998$, $P_{01}(0) = 0.5$ and $P_{00}(0) = 0.0001$. The $\Delta_{\bar{f}} = 0$ plane has been marked to distinguish between conditions in which recombination is favorable ($\Delta_{\bar{f}} > 0$) or not.



Figure 12: Value of $\Delta$ at different generations for the two-locus two-allele system as a function of fitness landscape, characterized by a and c. The initial population is $P_{ij}(0) = 0.25$. The $\Delta = 0$ plane has been marked to distinguish between conditions in which recombination is favorable ($\Delta < 0$) or not. The curve on the plane is $ac = b^2$, the condition for a multiplicative landscape.

5.2.2. Neutral landscape: $b_1 = b_2 = c = 0$, $a \neq 0$ ($\Lambda = 1$)

For a neutral landscape, where the effects of selection are null, the "more building blocks the better recombination is" rule is valid, as with the additive landscape, but we see a different behavior as a function of initial population. For neutral evolution, the SWLD, $\Delta$, and the standard linkage disequilibrium coefficient, D, are in fact the same. So, Figure 15 shows the approach to the Geiringer or Robbins manifold, defined by D = 0. The approach to this manifold is from the negative or positive side depending on whether the initial population is dominated by the BBs 01 and 10, or by the optimal genotype 11. The Geiringer limit has been amply studied in the literature [27]. Thus, recombination is beneficial when there is an ample supply of BBs and few optimal types, and deleterious when there are no BBs. The minimal value of $\Delta$ is for $P_{01}(0) = 0.5$ and the maximal for $P_{01}(0) = 0$, $P_{00}(0) = P_{11}(0) = 0.5$.
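The location of these extrema can be checked with a short scan over initial populations. On a neutral landscape selection leaves the frequencies unchanged, so at t = 0 the SWLD reduces to $D = P_{11}P_{00} - P_{10}P_{01}$. The sketch below evaluates this on a grid with $P_{10} = P_{01}$ using exact rational arithmetic, and also applies one recombination step to show the geometric approach to D = 0. The grid resolution, the value $p_c = 1/2$ and the function names are illustrative assumptions.

```python
from fractions import Fraction


def swld_neutral(p11, p01):
    """Delta at t = 0 on a neutral landscape with P10 = P01 (equal to D)."""
    p00 = 1 - p11 - 2 * p01
    return p11 * p00 - p01 * p01


def recombine(p11, p01, p_c):
    """One neutral generation: recombination moves p_c * D from 11/00 to 10/01."""
    d = swld_neutral(p11, p01)
    return p11 - p_c * d, p01 + p_c * d


N = 20  # grid resolution (assumed)
grid = [(Fraction(i, N), Fraction(j, N))
        for i in range(N + 1) for j in range(N + 1) if i + 2 * j <= N]

lowest = min(grid, key=lambda p: swld_neutral(*p))
highest = max(grid, key=lambda p: swld_neutral(*p))
print("minimum of Delta at (P11, P01) =", lowest, "value", swld_neutral(*lowest))
print("maximum of Delta at (P11, P01) =", highest, "value", swld_neutral(*highest))

# One recombination step shrinks D by the factor (1 - p_c).
start = (Fraction(1, 10), Fraction(2, 5))
after = recombine(*start, Fraction(1, 2))
print(swld_neutral(*start), "->", swld_neutral(*after))
```

The scan places the minimum at $P_{01} = 0.5$ and the maximum at $P_{01} = 0$, $P_{11} = 0.5$ (and hence $P_{00} = 0.5$), matching the statement above.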

Figure 13: Value of $\Delta_{\bar{f}}$ at different generations for the two-locus two-allele system as a function of fitness landscape, characterized by a and c. The initial population is $P_{ij}(0) = 0.25$. The $\Delta_{\bar{f}} = 0$ plane has been marked to distinguish between conditions in which recombination is favorable ($\Delta_{\bar{f}} > 0$) or not.

5.2.3. Multiplicative landscape: $a = \gamma\delta$, $b_1 = \delta(\alpha - \gamma)$, $b_2 = \gamma(\beta - \delta)$, $c = \alpha\beta + \gamma\delta - \alpha\delta - \beta\gamma$ with $\alpha = 10$, $\beta = 9$, $\gamma = 2$, $\delta = 1$ ($\Lambda = 1$)

We now turn to the case of a multiplicative landscape, where the allele fitnesses are taken to be $f^{(1)}_0 = \gamma$ and $f^{(1)}_1 = \alpha$ for the first locus and $f^{(2)}_0 = \delta$ and $f^{(2)}_1 = \beta$ for the second locus. This landscape satisfies the multiplicative constraint, which for $b_1 \neq b_2$ reads $ac = b_1 b_2$. Here we see that recombination is favorable in the search regime where the BB frequency is high and the frequency of the optimal genotype is low. However, for other than very small $P_{11}$ we can see that recombination is somewhat unfavorable when the BB frequency is relatively low but, in the main, it is generally neutral in its effects. This is consistent with known results for multiplicative landscapes. In fact, viewing the time evolution, even if one starts in the search regime we see that very quickly the system approaches linkage equilibrium.
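The arithmetic behind this parametrization is easy to verify. The following sketch computes a, $b_1$, $b_2$ and c from the four allele fitnesses, rebuilds the genotype fitnesses, and checks both the multiplicative constraint and $\Lambda = 1$. The function name is a hypothetical helper; the formulas and the numerical values are those quoted in the heading above.

```python
def multiplicative_landscape(alpha, beta, gamma, delta):
    """Landscape parameters (a, b1, b2, c) for multiplicative allele fitnesses."""
    a = gamma * delta                      # f_00
    b1 = delta * (alpha - gamma)           # f_10 - f_00
    b2 = gamma * (beta - delta)            # f_01 - f_00
    c = alpha * beta + gamma * delta - alpha * delta - beta * gamma
    return a, b1, b2, c


a, b1, b2, c = multiplicative_landscape(10, 9, 2, 1)
fitness = {"00": a, "10": a + b1, "01": a + b2, "11": a + b1 + b2 + c}
lam = fitness["10"] * fitness["01"] / (fitness["11"] * fitness["00"])

print(a, b1, b2, c)          # 2 8 16 64
print(fitness)               # {'00': 2, '10': 10, '01': 18, '11': 90}
print(a * c == b1 * b2, lam) # True 1.0
```

Each genotype fitness is the product of its two allele fitnesses (for instance $f_{11} = \alpha\beta = 90$), which is what forces $\Lambda = 1$.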

Figure 14: Value of $\Delta$ at different time steps for a two-locus two-allele system with an additive fitness landscape (a = c = 0) for different values of the initial population given by $P_{11}$ and $P_{10} (= P_{01})$. The $\Delta = 0$ plane has been marked to distinguish between conditions in which recombination is favorable ($\Delta < 0$) or not.

Figure 15: Value of $\Delta$ at different time steps for a two-locus two-allele system with a neutral ($b_1 = b_2 = c = 0$, $a \neq 0$) fitness landscape for different values of the initial population given by $P_{11}(0)$ and $P_{10}(0) = P_{01}(0)$. The $\Delta = 0$ plane has been marked to distinguish between conditions in which recombination is favorable ($\Delta < 0$) or not.

5.2.4. Needle-In-A-Haystack: $b_1 = b_2 = 0$, $c \neq 0$, $a \neq 0$ ($\Lambda = \frac{a}{a + c}$)

We now turn to the final fixed landscape we will consider, that of NIAH, which has been used extensively in models of molecular evolution and, especially, in considerations of selection-mutation balance and the existence of error thresholds. As a function of the initial population we can clearly see that in the search regime, where there is an ample supply of BBs and only a zero or small proportion of the optimal genotype, recombination is favorable, both in terms of leading to a more efficient production of the optimal genotype when compared to selection only ($\Delta < 0$) as well as a fitter population ($\Delta_{\bar{f}} > 0$). On the other hand, away from the search regime it is clear that the effects of recombination are unfavorable. Note that the advantage or disadvantage of recombination decreases in time as the system gets closer to linkage equilibrium, this equilibrium being associated with a population dominated by the optimal genotype.
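To make the contrast between the search regime and the rest of the population space concrete for the Needle-In-A-Haystack landscape, the sketch below follows the SWLD coefficient from two initial populations: one made only of BBs and one with no BBs at all. The landscape values a = 1 and c = 0.001 are those quoted in the NIAH figure captions; the value $p_c = 0.5$, the two starting populations, the number of generations and the function names are assumptions made for illustration.

```python
GENOTYPES = ("00", "01", "10", "11")


def step(P, f, p_c):
    """One generation of selection then recombination. Returns (P_next, Delta)."""
    f_bar = sum(f[g] * P[g] for g in GENOTYPES)
    P_sel = {g: f[g] * P[g] / f_bar for g in GENOTYPES}
    delta = P_sel["11"] * P_sel["00"] - P_sel["10"] * P_sel["01"]
    sign = {"11": -1.0, "00": -1.0, "10": 1.0, "01": 1.0}
    return {g: P_sel[g] + sign[g] * p_c * delta for g in GENOTYPES}, delta


def swld_trajectory(P, f, p_c, generations):
    """List of Delta(t) for t = 0 .. generations."""
    deltas = []
    for _ in range(generations + 1):
        P, delta = step(P, f, p_c)
        deltas.append(delta)
    return deltas


niah = {"00": 1.0, "01": 1.0, "10": 1.0, "11": 1.001}  # a = 1, c = 0.001
p_c = 0.5  # assumed recombination probability

bb_rich = {"00": 0.0, "01": 0.5, "10": 0.5, "11": 0.0}  # search regime
no_bbs = {"00": 0.5, "01": 0.0, "10": 0.0, "11": 0.5}   # no building blocks

search = swld_trajectory(bb_rich, niah, p_c, 50)
settled = swld_trajectory(no_bbs, niah, p_c, 50)
print("BB-rich start: Delta(0) =", search[0], " Delta(50) =", search[50])
print("no-BB start:   Delta(0) =", settled[0], " Delta(50) =", settled[50])
```

In this sketch the BB-rich population starts with $\Delta < 0$, and the sign later turns positive once the search phase is over, as expected for a landscape with $\Lambda < 1$; the population without BBs has $\Delta > 0$ from the outset.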

6. Conclusion 

As discussed in the introduction, genetic recombination remains a puzzle as far as having a full, intuitive understanding of why it is so prevalent, with no generally accepted explanation of its benefits. Many theoretical analyses have been performed. The vast majority of these have been in the context of variations on a theme of standard population genetics models - haploid, diploid, with modifier genes, without modifier genes, with finite population, with infinite population, with mutation, without mutation, with few loci, with many loci, with different fitness landscapes, with different population states etc. Of course, to understand the benefits of recombination in the context of a mathematical model, the model itself must contain a description of the mechanisms that explain why it is useful in the first place. The question is then: do the benefits lie outside the context of the models that have been studied, or are they hidden within the results of these models? If the former is true, then one must formulate a new model, with new features, which will then make manifest its utility. On the other hand, if the latter is the case, then it is important to have a model that can be studied exhaustively, in that there is no region of the parameter space of the model that remains unexplored. Additionally, the model should be such that the effective degrees of freedom of the underlying system are manifest.

Figure 16: Value of $\Delta$ at different time steps for a two-locus two-allele system with a multiplicative fitness landscape, where ($a = \gamma\delta$, $b_1 = \delta(\alpha - \gamma)$, $b_2 = \gamma(\beta - \delta)$, $c = \alpha\beta + \gamma\delta - \alpha\delta - \beta\gamma$, with $\alpha = 10$, $\beta = 9$, $\gamma = 2$, $\delta = 1$) for different values of the initial population given by $P_{11}$ and $P_{10} (= P_{01})$. The $\Delta = 0$ plane has been marked to distinguish between conditions in which recombination is favorable ($\Delta < 0$) or not.

Figure 17: Value of $\Delta$ at different generations for a two-locus two-allele system with a "Needle in a haystack" fitness landscape ($b_1 = b_2 = 0$, c = 0.001, a = 1) for different values of the initial population given by $P_{11}$ and $P_{10} (= P_{01})$. The $\Delta = 0$ plane has been marked to distinguish between conditions in which recombination is favorable ($\Delta < 0$) or not.



Previous work [35], both analytical and numerical, has hinted at the fact that recombination seems to be especially useful in the context of additive landscapes. However, these analyses did not cover the full parameter space of the considered models, and so there is always doubt as to whether the specific landscape or the specific initial population considered were representative, and therefore whether any identified benefits of recombination were "universal" or, rather, tied to the specific scenario considered. To counter these arguments, in this paper, we have taken the route of fixing a simple model - a two locus, two allele system of haploid sequences with non-overlapping generations evolving in the presence of selection and homologous recombination - but have analysed the full parameter space of the model. This corresponds to three population variables and four landscape parameters. Having fixed the model, we can begin to look for the regions of parameter space, if any, in which recombination is beneficial. Of course, we first have to define what we mean by "beneficial". In this paper we fixed two metrics: one was the SWLD coefficient for the optimal genotype that measures the excess production of such types over and above that which is produced by selection only; and the other is the increase in average population fitness over and above that which would be produced by selection only. With these two metrics we measure the benefits of recombination in terms of its capacity to lead to higher proportions of fitter genotypes and fitter populations relative to selection only.

Figure 18: Value of $\Delta_{\bar{f}}$ at different generations for a two-locus two-allele system with a "Needle in a haystack" fitness landscape ($b_1 = b_2 = 0$, c = 0.001, a = 1) for different values of the initial population given by $P_{11}$ and $P_{10} (= P_{01})$. The $\Delta_{\bar{f}} = 0$ plane has been marked to distinguish between conditions in which recombination is favorable ($\Delta_{\bar{f}} > 0$) or not.

So, what does our analysis of the parameter space of this model tell us? 
The analyses we have carried out are completely consistent with the previous 
results of [37] showing that there are two important, but distinct, regimes
in which recombination is beneficial in terms of both the metrics that we 
have used to characterize its benefits. The first of these is the search regime, 
which is associated with conditions where the fittest genotype is either not 
present or only at low frequency. In this regime recombination is of benefit 
independently of the fitness landscape. However, exactly how beneficial it is 
does depend on the landscape. The more positive epistasis that is present 
the less the benefit, both in terms of the excess production of fitter types 
and average population fitness relative to selection only. This effect is, in 
fact, intimately related to the main result of this paper - that the benefits 
of recombination, other than in the search regime, are manifest in fitness 
landscapes that are "modular" , by which we mean that they are only weakly 
epistatic, with purely additive landscapes being the extreme form. This 
second, landscape dependent, regime in which recombination is beneficial we 
term the modular regime. 

We believe that the results of this paper unite two important threads of 
evolutionary thought - the ubiquity of genetic recombination and the ubiquity 
of modularity. This paper is not the appropriate forum in which to discuss 
the reasons why modularity is so important. There are many papers on the 
subject. However, it is amazing that the benefits of recombination seem to 
be so intimately tied to this phenomenon, at least in the framework of the 
fitness landscape paradigm as discussed here. This leads, indeed, to another 
evolutionary "chicken and egg" puzzle. Did recombination evolve to take 
advantage of the existence of modularity or vice versa? We would posit 
that there has been a strong co-evolutionary link between the two since the 
beginnings of life. 

So, what are the weak points of our model and analysis? First of all, one 
could criticize the simplicity of the model, although the model shares many 
features with previous analyses. The fact that only two loci are considered is 
the price we pay for being able to consider the full parameter space. However, 
it is worth mentioning again that these "loci" could represent different levels 
of description from, in principle, nucleotides up to entire sets of genes. Our 
other restriction is that we can describe each locus in terms of two possible 
states. We are quite sure that no qualitative effect that we have observed here 
depends on the existence of only two alleles. The question is: are the effects 
we see and the conclusions we draw from the two-locus model generalizable 
to multilocus models? Unfortunately, we cannot exhaustively analyse the full 
parameter space of a multilocus model. For $\ell$ loci there are, in principle, 
$2^\ell - 1$ population parameters and $2^\ell$ landscape parameters to contend with. 
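
To make the growth of this parameter space concrete, here is a small sketch. It assumes that, for n two-allele loci, the population is described by 2^n - 1 independent genotype frequencies (since the frequencies sum to one) and the landscape by 2^n fitness values. The function name is hypothetical.

```python
# Illustrative sketch: counting parameters for n two-allele loci, assuming
# one frequency and one fitness value per genotype.

def parameter_counts(n_loci):
    """Return (independent population parameters, landscape parameters)."""
    n_genotypes = 2 ** n_loci
    # Frequencies sum to one, so one of them is not independent.
    return n_genotypes - 1, n_genotypes


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(n, parameter_counts(n))
```

The counts double with every added locus, which is why an exhaustive scan of the kind carried out here for two loci quickly becomes impractical.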

Previously [38], we performed analyses with multiple loci, numerically 
investigating the dynamics for certain specific landscapes and initial 
populations. The results seen there are completely consistent with what we 
observe in full generality in this paper, i.e., that the benefits of recombina- 
tion when not in the search regime are manifest in modular landscapes while, 
on the contrary, it is detrimental in the presence of high epistasis. In this 
paper we have also neglected the effects of mutation, whereas much previous 
work has studied how recombination interacts with mutation, positing 
Muller's ratchet type regimes in which the dynamics of beneficial or 
detrimental single mutations are considered in the presence of 
recombination. An important question is how the benefits of mutation 
compare with those of recombination in terms of the metrics that we have 
considered here. We will return to this question in a separate paper. 
However, it is first important to understand what benefits there are that are 
intrinsic to recombination without a comparison with mutation. 

Finally, we have also restricted attention here to fixed-length sequences. 
We believe that the relation between recombination and modularity extends 
beyond this restriction, applying also to variable-length sequences and to 
recombination-like genetic operators other than homologous recombination, 
for instance unequal crossing over or gene duplication. 